1. INTRODUCTION

In healthcare binary classification, most datasets are imbalanced: records of critical conditions are far fewer than records of normal conditions. Critical conditions such as heart attack, asthma attack and hypoglycemia occur rarely, which gives rise to highly imbalanced datasets. If the imbalance is not taken into account and the dataset is not balanced, the analysis may lead to wrong or inappropriate conclusions. Dataset balancing therefore plays a critical role. In addition, an ensemble technique is generally preferred over a comparative analysis based on individual classifiers alone.

Diabetes is considered one of the fastest-spreading chronic diseases. The experimentation was carried out on a diabetes dataset for hypoglycemia detection. Hypoglycemia is a lowering of the blood glucose level, which may lead to severe complications for an individual [1]. We have made an attempt to detect hypoglycemia using superficial body parameter values. The blood glucose level (BGL) is usually obtained by a pricking method; here, an attempt was made to detect the occurrence of hypoglycemia in people with diabetes using their superficial body parameters instead [2], [3].

Journal homepage: http://ijeecs.iaescore.com

Indonesian J Elec Eng & Comp Sci, ISSN: 2502-4752

Machine learning has played an important role in accurate decision-making for any type of dataset. The variety of classifiers and performance metrics available gives researchers good scope for healthcare analysis [4]-[6]. The literature survey focuses mainly on imbalanced dataset handling, evaluation methods using machine learning, and ensemble techniques. In one paper, the authors focused on the imbalanced dataset problem and used the SVM-SMOTE method for dataset balancing. Classifier performance was evaluated with KFold cross validation (K=5), and a hybrid model combining genetic algorithm-based feature selection with the stacking ensemble method was used [7].

In a related work, the authors used two oversampling techniques for dataset imbalance, viz. synthetic minority oversampling technique (SMOTE) and adaptive synthetic (ADASYN) sampling. The Pima dataset was used with four classifiers: random forest, logistic regression, XGBoost and support vector machine (SVM). The metric values improved significantly after SMOTE and ADASYN oversampling of the imbalanced dataset [8]. Binary classification using imbalanced and balanced techniques was also discussed, and the ensemble of classifiers based on multiobjective genetic sampling for imbalanced classification (E-MOSAIC) method was elaborated. This genetic algorithm-based method avoids both the loss of information caused by undersampling and the repeated information caused by oversampling [9]. Imbalanced datasets often lead to lowered performance. In another study, the authors considered intrusion detection system (IDS) datasets for machine learning analysis. Six different IDS datasets with high imbalance were considered, and balancing with SMOTE improved performance [10].

A novel approach was proposed that first preprocesses the imbalanced dataset to convert it to a balanced dataset, which is then given as the training dataset to an ensemble classifier model. The hybrid classifier ensemble approach had two phases, viz. resampling the dataset using SMOTE for balancing, and a StackingC classifier ensemble. The general classifier approach was then compared with the proposed hybrid approach, which showed a significant rise in AUC score [11]. A unique ensemble strategy for medical diagnosis was also proposed, combining SMOTE with a cross validation technique; in the last phase, a weighted majority voting strategy was used to demonstrate the efficiency of the proposed ensemble [8]. The study was then extended to performance metrics in machine learning, since relying on the accuracy score alone is not a good choice. In one article, the authors gave a comparative study of the metrics used in machine learning for imbalanced datasets. The difference between the majority and minority class sizes affects metrics such as accuracy and F1-score, while the area under the receiver operating characteristic curve (AUC) metric shows no effect [12]. Another study covered different ensemble approaches for machine learning, viz. bagging, Breiman boosting and Freund boosting, with imbalanced datasets as the main consideration. Different metrics for imbalanced datasets were discussed and experimented with, and AUC was considered to be the most robust [13]. Finally, regarding evaluation methods, cross validation proves to give good results and is often a good choice when it is unclear how to select the train and test datasets. A detailed discussion of cross validation techniques was given for classifier implementations on different datasets. Cross validation using KFold and Stratified KFold was implemented and compared with classifier metric values obtained without cross validation, and different K values were evaluated [14].


2. METHOD 

The first step is dataset generation. In this case, the dataset was generated from real-time data inputs taken from 13 different patients. The details of the dataset are given below. The dataset is also available on IEEE DataPort [15].


2.1. Dataset 

Data for the selected features were collected from diabetic patients in real time. Thirteen patients of different age groups recorded readings using calibrated wearable devices and apparatus. The FreeStyle Libre Pro continuous glucose monitoring sensor was used to obtain the blood glucose level readings, while the remaining parameter readings were taken from a Riversong Wave O2 colored smart band. The Libre Pro device and the Riversong Wave bands were calibrated together for time settings [15].
Dataset Features: The dataset, with around 70000 records [15], has the following features,
a) Records from both diabetic and non-diabetic people.
b) A structured dataset whose features used for experimentation are Diastolic BP, Systolic BP, Heart Rate, Shivering, Body Temperature, age, hypoglycemia detected and prehypoglycemia.
c) Dataset fields such as blood glucose level (BGL), SPO2, Sweating and diabetic/nondiabetic were not used.
d) Hypoglycemia detected was used as the target field.
2.2. Ensemble approach used

Healthcare datasets are usually imbalanced: the number of records in which a disease is diagnosed is smaller than the number of records in which it is not. Therefore, an ensemble approach was used for hypoglycemia detection analysis using superficial body parameter readings [16], [17].


2.3. Implementation 

The ensemble approach implemented was a three-step method, shown in Figure 1.
Step 1: An oversampling method was used to convert the imbalanced dataset to a balanced dataset [15]. Balancing was done with the oversampling techniques SMOTE [16] and ADASYN [17]-[20]. The major imbalance in the original dataset was in the hypoglycemia detected field. The original dataset had the following counts: total records: 70943; hypoglycemia detected records: 9055; hypoglycemia not detected records: 61888.
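The oversampling idea in Step 1 can be illustrated with a minimal pure-Python sketch of SMOTE-style interpolation. It is not the authors' implementation: the function name `smote`, the toy points and the choice of `k` are assumptions made for illustration, and only the three record counts come from the text above.

```python
import math
import random

def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority points to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic

# Counts reported for the original dataset
n_total, n_hypo, n_normal = 70943, 9055, 61888
imbalance_ratio = n_normal / n_hypo   # majority records per minority record
minority_share = n_hypo / n_total     # fraction of hypoglycemia records

# Toy minority class (hypothetical values, not taken from the dataset)
toy_minority = [(60.0, 110.0), (62.0, 108.0), (58.0, 115.0), (65.0, 112.0)]
toy_synthetic = smote(toy_minority, n_new=6, k=2)
```

Each synthetic point lies on a line segment between two existing minority points, so the new records stay inside the region already occupied by the minority class rather than duplicating records.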


Step 2: Evaluation was done in four different ways, viz. crossfold validation, stratified crossfold validation, train-test and repeat train-test [21], [22].

Cross Validation: Cross validation of the five classifiers used in this work was done with the KFold and Stratified KFold methods, with K set to 10 and 20. Stratified KFold is considered one way to handle dataset imbalance, since each fold preserves the class proportions.

Train Test: The train-test variation was done with train:test ratios of 7:3 and 8:2. A repeat train-test strategy was adopted to get more reliable results: the number of repeats was set to 10 and 20, and the classifiers were evaluated for both the 7:3 and 8:2 ratios.
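The stratified KFold idea in Step 2 can be sketched as follows. This is a simplified illustration in plain Python, not the library routine the authors may have used; the function name `stratified_kfold` and the toy labels are assumptions.

```python
import random
from collections import defaultdict

def stratified_kfold(labels, k, seed=0):
    """Yield (train_idx, test_idx) pairs whose test folds keep the class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets its share
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train_idx, test_idx

# Toy labels (hypothetical): 13 positives in 100 records
toy_labels = [1] * 13 + [0] * 87
splits = list(stratified_kfold(toy_labels, k=10))
positives_per_fold = [sum(toy_labels[i] for i in test) for _, test in splits]
```

With plain KFold on such data a fold could end up with no positive records at all, whereas here every fold receives one or two of the 13 positives.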


Step 3: An average StackingC strategy was used at the last stage to reach a conclusion.


Figure 1. Ensemble approach for hypoglycemia detection 


The ensemble approach with the cross validation and train-test techniques was applied to the following five supervised classifiers [23], [24], viz. K-nearest neighbor (KNN), SVM, random forest, Naïve Bayes and logistic regression. The cross validation technique makes the results more robust. In addition, repeating the train-test method with a different combination of records each time reduces the possibility of skipping any important record.
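The repeat train-test strategy can be sketched as a loop over fresh random splits whose scores are averaged. The function `repeated_train_test` and the stand-in `toy_evaluate` are hypothetical names used only for illustration; in practice `evaluate` would fit a classifier on the training indices and score it on the test indices.

```python
import random
import statistics

def repeated_train_test(n_records, test_fraction, repeats, evaluate, seed=0):
    """Average evaluate(train_idx, test_idx) over several random splits."""
    rng = random.Random(seed)
    n_test = round(n_records * test_fraction)
    scores = []
    for _ in range(repeats):
        indices = list(range(n_records))
        rng.shuffle(indices)  # a fresh random partition on every repeat
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        scores.append(evaluate(train_idx, test_idx))
    return statistics.mean(scores), scores

# Hypothetical stand-in for "fit a classifier and return its score":
# it only reports the share of records that ended up in the test split.
def toy_evaluate(train_idx, test_idx):
    return len(test_idx) / (len(train_idx) + len(test_idx))

# 7:3 split repeated 10 times, matching one of the settings described above
mean_score, all_scores = repeated_train_test(
    n_records=1000, test_fraction=0.3, repeats=10, evaluate=toy_evaluate
)
```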


2.4. Comparative analysis 

A comparative analysis was done for all the above datasets with all the experimentation strategies. The machine learning experiments were evaluated on the following metrics [25]-[27]: AUC score, accuracy, F1-score, precision, recall, train time and test time. According to the literature studied [12]-[14], the ROC AUC score is a more robust metric for imbalanced datasets than metrics such as accuracy, F1-score, precision and recall.
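A small worked example shows why accuracy alone can mislead on this dataset. The sketch below uses only the class counts reported earlier (9055 and 61888); the helper names and the toy score lists are assumptions. A classifier that never predicts hypoglycemia scores about 87% accuracy while its recall is zero and its AUC is 0.5.

```python
def basic_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

def roc_auc(y_true, scores):
    """AUC as the chance that a positive outranks a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# A classifier that never predicts hypoglycemia, on the reported class counts
baseline = basic_metrics(tp=0, fp=0, fn=9055, tn=61888)

# Toy scores (hypothetical) to contrast a constant scorer with an informative one
toy_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
constant_auc = roc_auc(toy_true, [0.0] * 10)
ranked_auc = roc_auc(toy_true, [0.9, 0.4, 0.5, 0.3, 0.2, 0.2, 0.1, 0.1, 0.1, 0.0])
```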


3. RESULTS AND DISCUSSION 
The two main categories for experimentation were the imbalanced and balanced datasets. The target feature is fixed to hypoglycemia detected, which is binary: 1 indicates that hypoglycemia was detected and 0 that it was not. Different datasets and strategies were used to generate results. The dataset was either the original imbalanced dataset or a dataset balanced by the SMOTE or ADASYN method. The strategy was one of cross validation, stratified crossfold, train-test or repeat train-test.


3.1. Imbalanced dataset experimentation 

The results in Figures 2 and 3 show that KNN, random forest, Naïve Bayes and logistic regression are the four classifiers that give good metric values for the imbalanced dataset, and that the AUC score is the more balanced metric. The imbalance addressed was in the hypoglycemia detected field. Of the 70,943 records in the dataset, 9,055 were hypoglycemia detected records and the remaining 61,888 were no hypoglycemia detected records, which is a severe imbalance. Oversampling was therefore applied to the original imbalanced dataset using the SMOTE and ADASYN methods.
Figure 2. Imbalanced dataset experiment analysis
Figure 3. Imbalanced dataset experiment analysis only for diabetic records


3.2. SMOTE balanced dataset experimentation

The SMOTE oversampling method was used to balance the dataset [28]-[30]. After SMOTE, the total number of records was 86,792, of which 43,396 were hypoglycemia detected records and 43,396 were no hypoglycemia detected records. The SMOTE balanced dataset experiment analysis is shown in Figure 4.


3.3. ADASYN balanced dataset experimentation 

ADASYN oversampling was also applied to obtain a balanced dataset [31]-[33]. After ADASYN, the total number of records was 86,704, of which 43,308 were hypoglycemia detected records and 43,396 were no hypoglycemia detected records. The ADASYN balanced dataset experiment analysis is shown in Figure 5.
Figure 5. ADASYN balanced dataset experiment analysis
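ADASYN differs from SMOTE in how it distributes the synthetic records: minority points surrounded by more majority-class neighbours receive more synthetic samples. The sketch below illustrates only this allocation step, in plain Python with one-dimensional toy points; the function name `adasyn_allocation` and all values are assumptions, not the authors' code.

```python
import math

def adasyn_allocation(minority, majority, n_new, k=5):
    """Split n_new synthetic samples across minority points, ADASYN-style."""
    labelled = [(p, 1) for p in minority] + [(p, 0) for p in majority]
    ratios = []
    for point in minority:
        neighbours = sorted(
            (item for item in labelled if item[0] is not point),
            key=lambda item: math.dist(point, item[0]),
        )[:k]
        # Share of majority-class records among the nearest neighbours
        n_majority = sum(1 for _, label in neighbours if label == 0)
        ratios.append(n_majority / len(neighbours))
    total = sum(ratios)
    if total == 0:
        return [0] * len(minority)  # no minority point borders the majority class
    return [round(n_new * r / total) for r in ratios]

# Toy data (hypothetical): three minority points far from the majority class,
# one minority point sitting inside it
toy_minority = [(0.0,), (0.1,), (0.2,), (5.0,)]
toy_majority = [(4.8,), (5.1,), (5.3,), (6.0,)]
allocation = adasyn_allocation(toy_minority, toy_majority, n_new=12, k=3)
```

In this toy case the minority point at 5.0 has only majority neighbours, so it receives half of the twelve synthetic samples, while the three easier points share the rest.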


3.4. Comparative analysis 

Three considerations were used to reach a conclusion. A comparative analysis of the metric values for the different classifiers is shown in Figure 6.
Consideration 1: Average of balanced and imbalanced results. The first consideration averaged the scores obtained from the different classifier metrics. The observations above show that the classifiers give comparatively similar metric values. Figure 6 shows the result of the average StackingC ensemble approach: the average of all metric values is taken over all the experiments. The classifiers are then rated on this metric evaluation. Table 1 shows that the random forest classifier has the highest metric values and is therefore ranked 1.
Figure 6. Comparative analysis of the metric values for different classifiers
Table 1. Classifier ratings according to metric evaluation consideration


Sr. no.  Classifier           Relevance              Rank
1        Random Forest        Highest metric values  1
2        KNN                  Mid metric values      2
3        Logistic Regression  Mid metric values      3
4        Naive Bayes          Mid metric values      4
5        SVM                  Lowest metric values   5
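The averaging behind Consideration 1 can be sketched as follows: each classifier's scores are averaged over all strategies, and the classifiers are ranked by that average. The model names and scores below are hypothetical placeholders, not results from this study, and the function `rank_by_average` is an assumed helper.

```python
from statistics import mean

# Hypothetical scores per model and evaluation strategy (illustrative only)
toy_scores = {
    "model_a": {"kfold": 0.90, "stratified_kfold": 0.92,
                "train_test": 0.88, "repeat_train_test": 0.90},
    "model_b": {"kfold": 0.95, "stratified_kfold": 0.96,
                "train_test": 0.94, "repeat_train_test": 0.95},
    "model_c": {"kfold": 0.80, "stratified_kfold": 0.82,
                "train_test": 0.78, "repeat_train_test": 0.80},
}

def rank_by_average(scores):
    """Return (rank, name, average score) tuples, best average first."""
    averages = {name: mean(per_strategy.values())
                for name, per_strategy in scores.items()}
    ordered = sorted(averages, key=averages.get, reverse=True)
    return [(rank, name, averages[name]) for rank, name in enumerate(ordered, start=1)]

ranking = rank_by_average(toy_scores)
```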


Consideration 2: Confusion matrix based. The second consideration took into account the false negative counts from the confusion matrix. In healthcare analysis, the false negative (FN) count must be given equal weight to the true positive (TP) count. A false negative means that a hypoglycemia state exists but is not diagnosed, which degrades the reliability of the diagnosis. Table 2 shows the true positive and false negative counts for the different classifier implementations with different strategies.


Table 2. Classifier confusion matrix true positive and false negative values obtained

Classifier  Count  Imb-All Dataset  Imb-Only Diabetes Dataset  ADASYN Balanced Dataset  SMOTE Balanced Dataset
RF          TP     95.8             95.8                       99.9                     98.2
            FN     4.2              4.2                        0.1                      1.8
KNN         TP     93.5             93.5                       99.6                     97.5
            FN     6.5              6.5                        0.4                      2.5
SVM         TP     92.8             92.8                       95.3                     97.5
            FN     7.2              12                         4.7                      2.5
LR          TP     91.8             91.8                       89.5                     94.9
            FN     8.2              8.2                        10.5                     5.1
NB          TP     90.2             90.2                       81.8                     94
            FN     9.8              9.8                        19.2                     6
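If the TP and FN entries in Table 2 are read as percentages of the actual hypoglycemia records (an assumption; the source labels them as counts), the two rows for each classifier are linked by the usual definitions of the true positive rate (recall) and the false negative rate:

```latex
\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad
\mathrm{FNR} = \frac{FN}{TP + FN} = 1 - \mathrm{TPR}
```

Under this reading, a lower FN entry means fewer missed hypoglycemia events, which is why the FN row is weighed alongside the TP row.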


Table 3 shows the ranking based on the highest true positive and lowest false negative counts. The random forest classifier is ranked 1 and Naïve Bayes is ranked last. Random forest, being an ensemble of decision trees, gives more true positive counts.


Table 3. Classifier ratings according to confusion matrix true positive and false negative consideration

Sr. no.  Classifier           Relevance           Rank
1        Random Forest        Highest preference  1
2        KNN                  Mid preference      2
3        Logistic Regression  Mid preference      3
4        SVM                  Mid preference      4
5        Naive Bayes          Lowest preference   5


Consideration 3: Train and test time based. The third consideration took execution time into account.
The average train time and test time were calculated over all the experiments in the following categories,
1. All imbalanced dataset implementations
2. ADASYN balanced dataset
3. SMOTE balanced dataset

Table 4 gives an overview of the train and test times required by the different classifier implementations. Execution time is a factor that cannot be neglected in any implementation. Table 4 shows that Naive Bayes has the lowest execution time and SVM the highest.


Table 4. Classifier ratings according to time consideration

                     ADASYN                 SMOTE                  Imbalanced
Classifier           Avg. train  Avg. test  Avg. train  Avg. test  Avg. train  Avg. test
                     time        time       time        time       time        time
KNN                  3.396       11.439     4.424875    11.99013   2.532       8.678
SVM                  25.877      0.856      33.4774375  0.877875   19.527      0.679
Random Forest        5.591       0.459      5.0714375   0.404625   4.130       0.321
Naïve Bayes          0.562       0.077      0.6616875   0.083125   0.437       0.059
Logistic Regression  7.591       0.050      8.733625    0.062125   6.749       0.044
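Train and test times such as those in Table 4 can be measured by timing the fit and predict calls separately. The sketch below assumes a model object with `fit` and `predict` methods; `MajorityClassifier` is a hypothetical stand-in used only to make the example self-contained, and it is not one of the classifiers evaluated here.

```python
import time

def time_fit_and_predict(model, X_train, y_train, X_test):
    """Return (train_time, test_time, predictions) measured in seconds."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    train_time = time.perf_counter() - start

    start = time.perf_counter()
    predictions = model.predict(X_test)
    test_time = time.perf_counter() - start
    return train_time, test_time, predictions

class MajorityClassifier:
    """Hypothetical stand-in that always predicts the most frequent label."""
    def fit(self, X, y):
        labels = list(y)
        self.label_ = max(set(labels), key=labels.count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]

train_time, test_time, predictions = time_fit_and_predict(
    MajorityClassifier(), [[1], [2], [3]], [0, 0, 1], [[4], [5]]
)
```

Averaging these two timings over all runs of a strategy gives one pair of values per classifier and dataset variant.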
3.5. Discussions

The implementation and results above show that evaluation based on individual classifiers alone is not sufficient to reach a conclusion. An ensemble approach plays an important role, especially for analysis in the healthcare sector. The metric scores together with the false negative counts give a clear idea of which classifier is well suited to the analysis. Execution time can also be taken into account when the diagnosis is time-critical. An ensemble approach is therefore a way to get more accurate results.


3.6. Comparison with the existing implementations 

The implementation in [25] focuses only on the cumulative scores of different classifiers, while the proposed ensemble approach takes into account different machine learning classifiers and evaluation methods such as cross fold, stratified cross fold, train-test and repeat train-test. This multi-step ensemble gives more accurate results. Similarly, the articles [24]-[27] use only different classifiers and performance metrics, while the proposed method considers not only the performance metric scores but also two further factors, viz. false negative counts and execution time. Combining the StackingC approach for the final metric scores, the confusion matrix for false negative counts, and the time considerations makes the results more concrete and clearer for analysis.


4. CONCLUSION 

A machine learning implementation of five classifiers, viz. random forest, KNN, logistic regression, Naïve Bayes and SVM, was carried out using an ensemble method. The SMOTE and ADASYN oversampling algorithms were used to balance the imbalance in the hypoglycemia detected attribute. The results were concluded on three points, viz. experimentation metric analysis, train and test time comparison, and TP and FN values. The three-stage ensemble experimentation concluded that random forest, KNN, logistic regression and Naïve Bayes are good candidates for hypoglycemia detection. The random forest classifier was found to be the most stable classifier across all strategies. We can also conclude that the FN count should be taken into consideration, as a disease that is not detected correctly may lead to serious conditions. Considering the FN count, it was also found that the SMOTE oversampling method gives better accuracy in terms of lower FN counts. An ensemble approach gives a better understanding of the implementation, which further helps in proper decision making. Rather than only comparing the classifier values obtained for different performance metrics, it is better to use an ensemble approach suited to the application. In healthcare data analytics, dataset imbalance is a major concern and is an important consideration when designing an ensemble approach.